Bioinformatics (Thomas Dandekar, Meik Kunz)

Extremely Fast Sequence Comparisons

Identify All the Molecules That Are Present in the Cell

Abstract

With the BLAST server at NCBI (National Center for Biotechnology Information), you can get an answer within seconds to a few minutes. This is made possible by searches that are fast but not entirely accurate. Almost all of the fast bioinformatics programs on the net use such heuristics. In BLAST, for example, a database entry is first pretested for two short but perfect matches with the query, and only then is an exact alignment with that entry performed, which saves a lot of computing time. A second time-saver is indexing the database (after all, you also find a topic in this book much faster via the table of contents than by browsing through it). Besides speed, sensitivity (do I recognize all relevant entries?) and specificity (do I not get too many irrelevant entries?) are also important for a good heuristic search.
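The pretest described above can be pictured with a small Python sketch. It is only an illustration under stated assumptions, not the actual BLAST implementation: the function names, the toy database, the word length `k = 3` and the `window` of 40 letters are all chosen here for demonstration, and only exact word matches are considered. The sketch indexes every word of length `k` in the database once, then keeps only those entries in which the query has two non-overlapping word hits on the same diagonal, that is, at the same offset between query and entry.

```python
from collections import defaultdict


def build_word_index(database, k):
    """Map every k-letter word to the (entry_id, position) pairs where it occurs."""
    index = defaultdict(list)
    for entry_id, sequence in database.items():
        for pos in range(len(sequence) - k + 1):
            index[sequence[pos:pos + k]].append((entry_id, pos))
    return index


def two_hit_candidates(query, index, k, window=40):
    """Return the entries with two non-overlapping exact word hits on one diagonal.

    The diagonal is (database position - query position); two hits on the same
    diagonal are consistent with a single gap-free alignment.
    """
    hits = defaultdict(list)  # (entry_id, diagonal) -> query positions of hits
    for qpos in range(len(query) - k + 1):
        word = query[qpos:qpos + k]
        for entry_id, dpos in index.get(word, ()):
            hits[(entry_id, dpos - qpos)].append(qpos)

    candidates = set()
    for (entry_id, _diagonal), qpositions in hits.items():
        # Positions are already ascending because qpos grows in the loop above.
        if any(
            k <= later - earlier <= window
            for i, earlier in enumerate(qpositions)
            for later in qpositions[i + 1:]
        ):
            candidates.add(entry_id)
    return candidates


# Toy data, invented for this illustration.
database = {
    "entry_A": "MKTAYIAKQRQISFVKSHFSRQ",  # contains the whole query
    "entry_B": "GGGGGGGGGGGGGGGGGGGGGG",  # shares no word with the query
    "entry_C": "TTTTAYIATTTTTTTTTTTTTT",  # shares one short stretch only
}
query = "TAYIAKQRQISF"
k = 3

index = build_word_index(database, k)          # done once, reused per query
candidates = two_hit_candidates(query, index, k)
print(candidates)  # only these entries would go on to exact alignment
```

In this toy case only `entry_A` passes the pretest. `entry_C` shares the stretch `TAYIA` with the query, but its word hits overlap, so it is discarded without an exact alignment. That is where the time is saved, and it is also where sensitivity can be lost if a relevant entry happens not to contain two such hits.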

How and why do bioinformatic analyses actually work? A very basic step towards understanding is to establish which biomolecule you have in front of you. For this purpose, bioinformatics analyzes the molecular sequence. It is important to remember that we first need the experimentally determined sequence. On its own, however, this sequence does not tell us which molecule is present. This can be solved by comparing the molecular sequence with all entries in a database (cf. Chap. 1). The interesting thing is that bioinformatics has developed very fast computational recipes (algorithms) for this task. This was necessary because the sequence databases have grown so quickly that we are now dealing with many millions of stored sequences and many billions of stored letters. How do you speed up bioinformatics algorithms so that they can cope with these large amounts of data?
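To get a feeling for the scale behind this question, the following back-of-the-envelope sketch compares two amounts of work. It assumes that an exact alignment fills a table with one cell per pair of query letter and database letter (a common way to compute exact alignments, although the text does not name a method), and that an index-based pretest looks up every word of the query once in a prebuilt word index. The function names and all three numbers are assumptions chosen for illustration, not measurements.

```python
def exhaustive_cells(query_length: int, database_letters: int) -> int:
    """Table cells filled if the query is aligned exactly against every stored letter."""
    return query_length * database_letters


def index_lookups(query_length: int, word_length: int) -> int:
    """Word lookups needed to consult a prebuilt index with every query word."""
    return max(query_length - word_length + 1, 0)


# All three values are assumptions for illustration only.
QUERY_LENGTH = 300          # a mid-sized protein
DATABASE_LETTERS = 10**9    # one billion letters, a conservative stand-in
WORD_LENGTH = 3

cells = exhaustive_cells(QUERY_LENGTH, DATABASE_LETTERS)
lookups = index_lookups(QUERY_LENGTH, WORD_LENGTH)

print(f"exhaustive comparison: {cells:.1e} table cells")
print(f"index-based pretest:   {lookups} word lookups")
```

The two figures are not directly comparable: each lookup returns a list of hits that still has to be processed, and every entry that survives the pretest still needs an exact alignment. The sketch only shows why an exhaustive comparison scales with the size of the whole database, whereas the first step of an indexed search scales with the length of the query.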

T. Dandekar, M. Kunz, Bioinformatics, https://doi.org/10.1007/978-3-662-65036-3_6